1. Introduction

Producing fault-free devices such as computer processors is so costly that only a few large
companies can afford to build and run new facilities. But even devices known
to be fully working initially may fail a posteriori. Fault-tolerant computing tries to
minimize the consequences of component failure by designing computer systems that
continue to operate satisfactorily even in the presence of faults. The majority of
fault-tolerant designs involve partitioning a computer system into modules that act as
fault-containment regions. Redundancy of these modules is then introduced, so that if one
fails others can assume its function, optimizing reliability, availability or efficiency.

While redundancy is expensive, components known to be imperfect are classified
as useless and become cheap if not free, even though they can still be of some use. For
instance, some devices with minor defects are still profitable, such as faulty memory chips
in answering machines [2]. Another example is the massively parallel computer Teramac
[3], built from devices of unknown status but connected with adaptive wiring so
as to avoid the defects. A third strategy was presented in [1] by one of us, where it was
noticed that devices are most often only partly defective and that one may therefore combine
them in such a way that their imperfections cancel. This is the essence of the Defect
Combination Problem (DCP), which applies to both analog and binary components.
While the analog problem was already addressed mathematically in [1], the aim of the
present paper is to solve its binary counterpart.

The paper is organized as follows: in the following section, section 2, we present the
problem and briefly discuss how to treat it with the tools and concepts of the statistical mechanics
of disordered systems. The canonical ensemble approach is used in section 3 to analyze
the typical properties of the model. In section 4 we compare the analytical work with
numerical simulations. Section 5 is devoted to the flux recycling problem.

2. Model definitions 

We assume that each device is able to perform $P$ different functions, numbered
by $\mu = 1, \ldots, P$. The manufacturing process is such that each function is either
permanently defective with probability $\phi$ or working with probability $1 - \phi$.‡ The DCP
consists in extracting from an ensemble of $N$ devices a subset such that the defects
compensate optimally.

More precisely, let us denote with Ising variables $\xi_i^\mu \in \{-1, +1\}$ whether the function
$\mu$ of the component $i$ is defective ($\xi_i^\mu = -1$) or not ($\xi_i^\mu = 1$). This means that the
manufacturing process is summarized by

$$P(\xi_i^\mu) = \phi\,\delta(\xi_i^\mu + 1) + (1 - \phi)\,\delta(\xi_i^\mu - 1) , \qquad (1)$$

which assumes that the state of the functions is determined independently at the time
at which the device is being made.§ To identify whether a component $i$ belongs to a
specific subset we introduce the boolean variables $\sigma_i \in \{0, 1\}$ such that $\sigma_i = 1$ if the
component belongs to the subset and $\sigma_i = 0$ otherwise. Every possible subset of
the ensemble is fully determined by a vector $\sigma = (\sigma_1, \ldots, \sigma_N)$. The binary DCP is
defined as the search for a combination such that the majority of its components gives
the correct answer for all the functions, in which case

$$\sum_{i=1}^{N} \xi_i^\mu \sigma_i > 0 , \qquad \forall \mu = 1, \ldots, P . \qquad (2)$$

Conditions (2) are also called majority vote in the fault-tolerant computing literature.
A simple inspection of the set of inequalities (2) indicates that a phase transition is
expected. Indeed, it is clear that when $N \gg P$ there is a large number of subsets, out
of the possible $2^N$, satisfying the above conditions. When $P$ increases the number of
these subsets decreases and finding configurations that satisfy the majority vote (2) becomes
increasingly difficult; at some point finding perfect subsets is not possible any more.
Therefore, one expects the existence of two phases: a fault-free one where perfect subsets
exist ($\alpha = P/N < \alpha_c$) and an imperfect one ($\alpha > \alpha_c$) where conditions (2) cannot all be satisfied and where the best

‡ This is not unrealistic: the quality of an electronic chip is determined by the local level of impurities
of the silicon wafer from which it is made. Local fluctuations of the impurity density can cause a
function to be defective.

§ This is akin to assuming that the local fluctuations of impurity density in the wafer example are
uncorrelated.
one can do is to minimize the number of unsatisfied conditions, that is, the number of
faulty functions, denoted by $C$.

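We define $C(\sigma)$, the number of unsatisfied conditions in subset $\sigma$, as

$$C(\sigma) = \sum_{\mu=1}^{P} \Theta\left(\kappa - \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \xi_i^\mu \sigma_i\right) , \qquad (3)$$

where $\Theta$ is the Heaviside step function and $\kappa$ is a confidence threshold ($\kappa = 0$ corresponds to the majority vote). If one
rewrites the defect distribution function as

$$P(\xi) = \frac{1 + m/\sqrt{N}}{2}\,\delta(\xi + 1) + \frac{1 - m/\sqrt{N}}{2}\,\delta(\xi - 1) , \qquad \phi = \frac{1 + m/\sqrt{N}}{2} , \qquad (4)$$

the similarity between the binary DCP and the optimal capacity problem of Ising neural
networks [?] is evident. More precisely, the binary DCP is equivalent to the optimal
capacity problem with $J = 0, 1$ synaptic couplings and biased patterns introduced in
[7].‖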

Whereas any combination can be considered in the above problem, the constrained
DCP restricts the choice of combinations to those comprising a fixed number $\sum_{i=1}^{N} \sigma_i$ of
components. The technological justification for this is that an actual implementation of
the DCP would be made easier by building in advance boards designed to receive a
fixed number of components.
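The definitions of this section can be illustrated on a very small instance. The following is a minimal sketch, not the authors' code: the function names `sample_defects`, `cost` and `best_subset` are hypothetical, the defect probability is taken to be $(1 + m/\sqrt{N})/2$, and a condition is counted as unsatisfied when the normalized field does not exceed $\kappa$. Exhaustive search is only feasible for small $N$.

```python
import itertools
import math
import random


def sample_defects(N, P, m, rng):
    """Draw xi[mu][i] in {-1, +1}; -1 (defective) with probability (1 + m/sqrt(N))/2."""
    phi = 0.5 * (1.0 + m / math.sqrt(N))
    return [[-1 if rng.random() < phi else 1 for _ in range(N)] for _ in range(P)]


def cost(xi, sigma, kappa=0.0):
    """Number of functions mu whose field sum_i xi[mu][i]*sigma[i]/sqrt(N) is <= kappa."""
    N = len(sigma)
    threshold = kappa * math.sqrt(N)
    return sum(
        1 for row in xi
        if sum(x * s for x, s in zip(row, sigma)) <= threshold
    )


def best_subset(xi, kappa=0.0, size=None):
    """Exhaustive search over non-empty subsets; `size` fixes the subset size
    (constrained problem) when it is not None. Returns (cost, sigma)."""
    N = len(xi[0])
    best = None
    for sigma in itertools.product((0, 1), repeat=N):
        k = sum(sigma)
        if k == 0 or (size is not None and k != size):
            continue
        c = cost(xi, sigma, kappa)
        if best is None or c < best[0]:
            best = (c, sigma)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    xi = sample_defects(N=10, P=4, m=0.0, rng=rng)
    print(best_subset(xi))          # unconstrained search
    print(best_subset(xi, size=5))  # constrained search with 5 components
```

A returned cost of 0 means that a perfect combination exists for that sample; the constrained variant simply restricts the enumeration to subsets of the requested size.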

3. Canonical approach 

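We shall proceed similarly as in [?]. In the canonical ensemble the typical
properties of the unconstrained DCP are fully described by the partition function

$$Z(\beta) = \sum_{\sigma} e^{-\beta C(\sigma)} \qquad (5)$$

with $\beta = 1/T$ the inverse temperature. The free energy $f$ then reads

$$f(\beta) = -\lim_{N \to \infty} \frac{1}{\alpha \beta N} \left\langle\!\left\langle \log Z(\beta) \right\rangle\!\right\rangle_{\xi} , \qquad (6)$$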



where $\langle\langle \cdots \rangle\rangle_{\xi}$ denotes the average with respect to $P(\xi)$. We are interested in the zero
temperature limit where the free energy $f$ corresponds to the fraction of erroneously
implemented functions whilst the entropy, defined as

$$s(\beta) = \alpha \beta^2 \frac{\partial f}{\partial \beta} , \qquad (7)$$

‖ It seems however that there is an inconsistency in [7] in the way the order parameters scale with
the system size that has gone unnoticed. Following their notation, the presence of bias in the patterns
implies that one must introduce the order parameter $(1/\sqrt{N}) \sum_{i=1}^{N} J_i$, while the diagonal part of
the spin-glass overlap gives $(1/N) \sum_{i=1}^{N} J_i^2$, thus having essentially the same parameter but scaling
differently with the system size. It may even be possible that this was the source of the discrepancy
between different models in the sparse coding limit [?]. A forthcoming work will address this
question in detail [14]. We note here that this inconsistency is cured as soon as one rescales the bias
parameter with the system size as in (4).
is the logarithm of the number of solutions to the DCP. Nevertheless it is interesting
to point out the behavior of this model with the temperature, which is similar to that of the Random
Energy Model [?]. In the fault-free regime ($\alpha < \alpha_c$) the entropy is positive at any
temperature while the free energy vanishes at zero temperature since there are perfect
subsets. In this regime the Replica Symmetric (RS) approximation is a very good
approximation (if not exact) at any temperature. The critical point $\alpha_c$ coincides with
the vanishing of the entropy at zero temperature, because there are no more perfect
subsets. However, for $\alpha > \alpha_c$ there exists a critical temperature $T_c$, called the freezing
temperature, below which RS is broken (the entropy becomes negative). A One-Step
Replica Symmetry Breaking (RSB) calculation reveals that for $T < T_c$ the RS entropy
becomes zero while the RS internal energy (fraction of errors) freezes to its value at $T_c$.¶

Starting from the expression (6) and following standard procedures [15] we write
the free energy as

$$f(\beta) = -\lim_{N \to \infty} \lim_{n \to 0} \frac{1}{\alpha \beta n N} \log \left\langle\!\left\langle Z^n(\beta) \right\rangle\!\right\rangle_{\xi} , \qquad (8)$$

where we have used the replica approach based on the identity $\langle\langle \log Z \rangle\rangle = \lim_{n \to 0} (\langle\langle Z^n \rangle\rangle - 1)/n$,
which consists in replacing the logarithm appearing in equation
(6) by an object much easier to average over the disorder. The $n$-th power of the
partition function indicates that the same system has been replicated $n$ times, hence the
name replica. After some straightforward manipulations, the replicated and averaged
partition function becomes



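$$\left\langle\!\left\langle Z^n(\beta) \right\rangle\!\right\rangle_{\xi} \propto \int \prod_{a} dM_a\, d\hat M_a \prod_{a < b} dq_{ab}\, d\hat q_{ab}\; e^{N (G_1 + G_2 + G_3)} , \qquad (9)$$

where we have defined the following macroscopic order parameters

$$M_a = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^a , \qquad q_{ab} = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^a \sigma_i^b , \qquad (10)$$

with replica indices $a, b = 1, \ldots, n$, $q_{aa} = M_a$ and $\hat q_{aa} = \hat M_a$, and with the functions $G_1$, $G_2$ and $G_3$ given by

$$G_1 = \frac{1}{2} \sum_{a=1}^{n} \hat M_a M_a - \frac{1}{2} \sum_{a \neq b} \hat q_{ab}\, q_{ab} , \qquad (11)$$

$$G_2 = \alpha \log \int \prod_{a=1}^{n} \frac{dh_a\, d\hat h_a}{2\pi}\, \exp\left( i \sum_{a=1}^{n} \hat h_a h_a + i m \sum_{a=1}^{n} \hat h_a M_a - \frac{1}{2} \sum_{a,b=1}^{n} \hat h_a \hat h_b\, q_{ab} \right) \prod_{a=1}^{n} \left[ e^{-\beta} + (1 - e^{-\beta})\, \Theta(h_a - \kappa) \right] , \qquad (12)$$

$$G_3 = \log \sum_{\sigma_1, \ldots, \sigma_n} \exp\left( -\frac{1}{2} \sum_{a=1}^{n} \hat M_a \sigma_a + \frac{1}{2} \sum_{a \neq b} \hat q_{ab}\, \sigma_a \sigma_b \right) . \qquad (13)$$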



¶ This discussion is based on a calculation in the canonical ensemble. One can also use the
microcanonical ensemble as in [?], in which a RS calculation is equivalent to the One-Step RSB
calculation in the canonical one.
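In the thermodynamic limit ($N, P \to \infty$ at fixed $\alpha = P/N$) the expression (9) is
evaluated by the steepest descent method and the free energy $f$ simply reads

$$f(\beta) = -\lim_{n \to 0} \frac{1}{\alpha \beta n}\, \mathrm{extr}_{\{q_{ab}, \hat q_{ab}\}} (G_1 + G_2 + G_3) . \qquad (14)$$

Within the RS ansatz the overlap parameters with two replica indexes are assumed to
be invariant under the interchange of all replica indexes. We then write

$$q_{ab} = M \delta_{ab} + q (1 - \delta_{ab}) , \qquad (15)$$

$$\hat q_{ab} = \hat M \delta_{ab} + \hat q (1 - \delta_{ab}) . \qquad (16)$$

Evaluation of the free energy $f(\beta)$ and the entropy $s(\beta)$ in the RS ansatz gives

$$f(\beta) = -\frac{1}{2 \alpha \beta} (M \hat M + q \hat q) - \frac{1}{\alpha \beta} \int Dt\, \log\left( 1 + e^{-\frac{\hat M + \hat q}{2} + t \sqrt{\hat q}} \right) - \frac{1}{\beta} \int Dt\, \log\left[ e^{-\beta} + (1 - e^{-\beta})\, H(x_t) \right] \qquad (17)$$

and

$$s(\beta) = \frac{1}{2} (M \hat M + q \hat q) + \int Dt\, \log\left( 1 + e^{-\frac{\hat M + \hat q}{2} + t \sqrt{\hat q}} \right) + \alpha \int Dt\, \log\left[ e^{-\beta} + (1 - e^{-\beta})\, H(x_t) \right] + \alpha \beta\, e^{-\beta} \int Dt\, \frac{1 - H(x_t)}{e^{-\beta} + (1 - e^{-\beta})\, H(x_t)} , \qquad (18)$$

with $x_t = (\kappa + mM + t\sqrt{q})/\sqrt{M - q}$, $Dt = dt\, e^{-t^2/2}/\sqrt{2\pi}$ and $H(x) = \mathrm{erfc}(x/\sqrt{2})/2$, $\mathrm{erfc}(x)$ being the complementary
error function. The free energy (17) must be stationary with respect
to $M$, $\hat M$, $q$ and $\hat q$. For the constrained model, stationarity with respect to $M$ must not be
imposed, and $M$ becomes a parameter that controls the relative size of the subset. The
saddle-point equations read

$$M = \int Dt \left( 1 + e^{\frac{\hat M + \hat q}{2} + t \sqrt{\hat q}} \right)^{-1} , \qquad (19)$$

$$q = \int Dt \left( 1 + e^{\frac{\hat M + \hat q}{2} + t \sqrt{\hat q}} \right)^{-2} , \qquad (20)$$

$$\hat M = -2\alpha \int Dt\, \frac{(1 - e^{-\beta})\, H_M(x_t)}{e^{-\beta} + (1 - e^{-\beta})\, H(x_t)} , \qquad (21)$$

$$\hat q = -2\alpha \int Dt\, \frac{(1 - e^{-\beta})\, H_q(x_t)}{e^{-\beta} + (1 - e^{-\beta})\, H(x_t)} , \qquad (22)$$

with

$$H_M(x_t) \equiv \frac{\partial H(x_t)}{\partial M} = \frac{1}{2}\, \frac{\kappa + t\sqrt{q} - m(M - 2q)}{\sqrt{2\pi (M - q)^3}}\, \exp\left( -\frac{(\kappa + mM + t\sqrt{q})^2}{2 (M - q)} \right) , \qquad (23)$$

$$H_q(x_t) \equiv \frac{\partial H(x_t)}{\partial q} = -\frac{1}{2}\, \frac{tM + \sqrt{q}\,(\kappa + mM)}{\sqrt{2\pi q (M - q)^3}}\, \exp\left( -\frac{(\kappa + mM + t\sqrt{q})^2}{2 (M - q)} \right) . \qquad (24)$$

We remark that equation (21) is not present in the constrained case. We will now study
the different regimes.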
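For readers who wish to evaluate the RS expressions numerically, the sketch below computes the RS entropy by Gauss-Hermite quadrature. It is an illustration under stated assumptions rather than the authors' implementation: it assumes the entropy has the form $s = \frac{1}{2}(M\hat M + q\hat q) + \int Dt \log(1 + e^{-(\hat M + \hat q)/2 + t\sqrt{\hat q}}) + \alpha \int Dt \log[e^{-\beta} + (1 - e^{-\beta})H(x_t)] + \alpha\beta e^{-\beta} \int Dt\, (1 - H(x_t))/[e^{-\beta} + (1 - e^{-\beta})H(x_t)]$ with $x_t = (\kappa + mM + t\sqrt{q})/\sqrt{M - q}$, and the names `gauss_measure`, `H` and `rs_entropy` are hypothetical. The order parameters must be supplied by the caller, for example from a separate saddle-point solver.

```python
import math

import numpy as np


def gauss_measure(n_nodes=80):
    """Nodes and weights such that sum(w * g(t)) approximates the integral of g over Dt."""
    t, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    return t, w / math.sqrt(2.0 * math.pi)


def H(x):
    """H(x) = erfc(x / sqrt(2)) / 2, applied elementwise."""
    return np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in np.atleast_1d(x)])


def rs_entropy(alpha, beta, M, q, M_hat, q_hat, m=0.0, kappa=0.0, n_nodes=80):
    """RS entropy per component for given order parameters (assumed functional form)."""
    t, w = gauss_measure(n_nodes)
    x = (kappa + m * M + t * math.sqrt(q)) / math.sqrt(M - q)
    boltz = math.exp(-beta)
    h = H(x)
    denom = boltz + (1.0 - boltz) * h
    site = np.log1p(np.exp(-0.5 * (M_hat + q_hat) + t * math.sqrt(q_hat)))
    return float(
        0.5 * (M * M_hat + q * q_hat)
        + np.sum(w * site)
        + alpha * np.sum(w * np.log(denom))
        + alpha * beta * boltz * np.sum(w * (1.0 - h) / denom)
    )


if __name__ == "__main__":
    # Without constraints (alpha = 0) the conjugate parameters vanish,
    # M = 1/2, q = 1/4, and the entropy should be close to log(2).
    print(rs_entropy(0.0, 5.0, 0.5, 0.25, 0.0, 0.0), math.log(2.0))
```

The $\alpha = 0$ case is a convenient sanity check because every one of the $2^N$ combinations is then acceptable.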
3.1. Fault-free regime, $\alpha < \alpha_c$

At zero temperature the fraction of errors $f(\infty)$ becomes zero in this regime, while
the entropy $s(\infty)$ is different from zero, indicating that there exist perfect subsets.
We have solved the saddle-point equations (19)-(22) numerically at zero temperature
(this limit is straightforward to take analytically) and for different values of the parameters
$\alpha$, $m$ and $\kappa$. At $\alpha = 0$ the entropy is simply $s = \log(2)$, that is, there are $2^N$ perfect
combinations. When $\alpha$ increases, the relative number of inequalities (2) increases and
the number of perfect subsets decreases accordingly, diminishing the entropy as well.
Note that as long as the entropy is finite, there is still an exponential number of perfect
combinations. This behavior appears in figure 1, where we have plotted the entropy
against $\alpha$ for different values of $m$ and $\kappa$ for the unconstrained case (the constrained case
presents a similar behavior). We have also plotted how the typical size $M$ varies with
$\alpha$. It typically presents a monotonic behavior depending on the value of $m$, but there
is an interval where $M$ is non-monotonic (see inset in figure 2). A naive explanation
of this monotonic behavior would be as follows: let us first assume that defects and working
functions are equally probable ($m = 0$). At $\alpha = 0$, there are no constraints,
hence all combinations have the same probability to be perfect and therefore $M = 1/2$.
As $\alpha$ increases it becomes more difficult to find perfect subsets and larger subsets are
less likely to satisfy the majority vote. Consequently, the average size $M$ is reduced as
$\alpha$ increases. Now for fixed $\alpha$, as $m$ increases there are more defects and the large
subsets are even less likely to satisfy the set of constraints (2); consequently the
average size $M$ is reduced further. For negative $m$ the opposite effect is observed
for obvious reasons.

Figure 1. Unconstrained case. Entropy $s_{RS}(m, \kappa, \alpha)$ versus $\alpha$. Left panel: $\kappa = 0$ and
$m = 1.0$, $0.0$ and $-1.0$ (left to right). Right panel: $\kappa = 0.2$ and $m = 1.0$, $0.0$ and $-1.0$ (left
to right).
3.2. Critical regime, $\alpha = \alpha_c$

From figure 1 we see that the critical point is reached when the entropy becomes zero,
i.e. when there are no perfect subsets left but one. We also studied the behavior of the
system at $\alpha = \alpha_c$. The zero entropy condition (at zero temperature) gives the following
equation for $\alpha_c$:

$$\alpha_c = -\left[ \frac{1}{2} (M \hat M + q \hat q) + \int Dt\, \log\left( 1 + \exp\left( -\frac{\hat M + \hat q}{2} + t \sqrt{\hat q} \right) \right) \right] \qquad (25)$$

$$\qquad \times \left[ \int Dt\, \log H\left( \frac{\kappa + mM + t\sqrt{q}}{\sqrt{M - q}} \right) \right]^{-1} . \qquad (26)$$

Figure 3. Unconstrained case. Left panel: $\alpha_c$ versus the bias parameter $m$ for different
values of the threshold $\kappa = 0.0$, $0.2$, $0.4$ and $0.6$ (from top to bottom). Right panel:
The typical critical size $M_c$ of the subset as a function of $m$ for different values of the
threshold $\kappa = 0.0$, $0.2$, $0.4$ and $0.6$ (from bottom to top).
Adding this to the previous saddle-point equations (19)-(22) and solving them
numerically allows the study of the critical behavior for both the unconstrained and
constrained cases. Figure 3 reports the results for the unconstrained case and figure 4 those for the
constrained one. These figures can in principle be used to determine the number $N$ of components needed
to be in the fault-free phase, since $P$ is given by the component and $m$ by the manufacturing
process.
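As a concrete reading of this remark, the fault-free condition $\alpha = P/N < \alpha_c$ can be turned into a minimal number of components. The helper below is a hypothetical illustration (the name `min_components` is not from the paper); it uses only the inequality itself and neglects finite-size effects, which the numerical section shows to be significant. The example value $\alpha_c \approx 0.58976$ is the theoretical value quoted in the numerical section for $m = 0$.

```python
import math


def min_components(P, alpha_c):
    """Smallest integer N such that alpha = P / N is strictly below alpha_c."""
    if P <= 0 or alpha_c <= 0:
        raise ValueError("P and alpha_c must be positive")
    return math.floor(P / alpha_c) + 1


if __name__ == "__main__":
    # P = 10 functions per component, alpha_c taken from the theory at m = 0.
    print(min_components(10, 0.58976))
```

In practice one would take a safety margin above this value, since close to $\alpha_c$ fluctuations can push a finite sample into the imperfect phase.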




Figure 4. Constrained case. Left panel: $\alpha_c$ versus relative size $M$ for $\kappa = 0$ and for
$m = -0.3$, $-0.1$, $0.0$, $0.1$ and $0.3$ (top to bottom). Right panel: $\alpha_c$ versus relative size
$M$ for $m = 0$ and for $\kappa = 0.0$, $0.2$, $0.4$ and $0.6$ (top to bottom).



3.3. Imperfect regime, $\alpha > \alpha_c$

In this regime the entropy decreases with the temperature down to $T = T_c$, below which it is
zero, with $T_c$ the freezing temperature. The fraction of errors is held constant in this
interval, i.e. $f(\beta_c) = f(\infty)$. The freezing temperature $T_c$ is therefore given by the zero
entropy condition $s(\beta_c) = 0$ and the fraction of errors $f$ then reads

$$f = e^{-\beta_c} \int Dt\, \frac{1 - H(x_t)}{e^{-\beta_c} + (1 - e^{-\beta_c})\, H(x_t)} , \qquad x_t = \frac{\kappa + mM + t\sqrt{q}}{\sqrt{M - q}} . \qquad (27)$$

Adding this condition to the saddle-point equations (19)-(22) fixes the temperature
to $T_c$, and their numerical solution allows one to evaluate the expression (27) for the fraction
of faulty functions. Notice that for $\alpha = \alpha_c$ the freezing temperature is $T_c = 0$ and therefore
$f = 0$.

Figure 5 shows the fraction of errors $f(\infty)$ versus $\alpha$ for different values of $\kappa$ and
$m$, while in figure 6 we have fixed $\alpha$ and plotted the fraction of faulty functions against
the subset size $M$.
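The fraction of errors can be evaluated with the same quadrature once the order parameters and the inverse temperature are known. The sketch below assumes the form $f = e^{-\beta} \int Dt\, [1 - H(x_t)] / [e^{-\beta} + (1 - e^{-\beta}) H(x_t)]$ with $x_t = (\kappa + mM + t\sqrt{q})/\sqrt{M - q}$; the function name `error_fraction` is hypothetical, and the values of $M$, $q$ and $\beta$ are assumed to come from a separate solution of the saddle-point and zero-entropy conditions.

```python
import math

import numpy as np


def error_fraction(beta, M, q, m=0.0, kappa=0.0, n_nodes=80):
    """Fraction of faulty functions for given beta, M and q (assumed functional form)."""
    t, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)  # normalize to the Gaussian measure Dt
    x = (kappa + m * M + t * math.sqrt(q)) / math.sqrt(M - q)
    h = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in x])
    boltz = math.exp(-beta)
    return float(np.sum(w * boltz * (1.0 - h) / (boltz + (1.0 - boltz) * h)))


if __name__ == "__main__":
    # At infinite temperature (beta = 0) and without bias or threshold,
    # a random subset fails each function with probability 1/2.
    print(error_fraction(0.0, 0.5, 0.25))
    print(error_fraction(2.0, 0.5, 0.25))
```

Lowering the temperature at fixed order parameters reduces the value, as expected from the Boltzmann weight $e^{-\beta}$ attached to each unsatisfied condition.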
Figure 5. Unconstrained case. Left panel: Fraction of faulty functions $f$ as a function
of $\alpha$ for $m = 0.0$ and $\kappa = 0.0$, $0.2$ and $0.4$ (right to left). Right panel: Fraction of faulty
functions $f$ as a function of $\alpha$ for $\kappa = 0.0$ and $m = -1.0$, $0.0$ and $1.0$ (right to left).

Figure 6. Constrained case. Fraction of faulty functions $f$ as a function of $M$ for
$m = 0.0$, $\alpha = 1.0$ and $\kappa = 0.0$, $0.2$ and $0.4$ (bottom to top).



4. Numerical simulations 

We carried out extensive numerical simulations in order to check the above theoretical
results. As in many similar neural network models, finite size effects are problematic
in the binary DCP [?]. Figure 7 plots $\alpha_c$ versus $m$ both for theoretical results
and numerical simulations. Fixing $N$ at 20, we enumerated all the combinations of
components: starting with $P = 1$ we added patterns, i.e. increased $P$, thereby enlarging
the set of disorder, until $P = P^*$ for which no perfect combination can be found.
The estimate of the critical point $\alpha_c(N)$ then satisfies $(P^* - 1)/N \le \alpha_c(N) < P^*/N$, which
gives lower and upper bounds for $\alpha_c$, denoted $\alpha_c^<$ and $\alpha_c^>$ respectively. The agreement
between theory and simulations is qualitatively good as long as $m$ does not take large
negative values.

Figure 7. Phase transition point $\alpha_c$ versus $m$. Theoretical results (continuous line),
upper bound $\alpha_c^>$ (down triangles) and lower bound $\alpha_c^<$ (up triangles); $N = 20$, average
over 10000 samples. Inset: finite-size analysis of $\alpha_c^>$ (solid down triangles) and $\alpha_c^<$ (up
triangles) as a function of $N$ and for $m = 0$; average over 10000 samples.

We checked that the discrepancy between the numerical simulations
and the theory is probably a finite size effect (see inset of figure 7 for $m = 0$): fitting
$\alpha_c^>(N)$ first without imposing any asymptotic value $\alpha_c^>(\infty)$, i.e. with $\alpha_c^>(N) = a N^b + c$,
yields $\alpha_c^>(N) = 9.194\, N^{-2.093} + 0.537$, with errors of 0.330 on $b$ and 0.006 on $c$.
The error on $a$ is very large, while that on $c$ gives a surprisingly precise estimate of
the theoretical value $\alpha_c = 0.58976\ldots$. Fitting our data with $\alpha_c^>(N) = a N^b + 0.5898$
gives $\alpha_c^>(N) = 11.602\, N^{-2.204} + 0.5898$ with much smaller errors of 2.178 and 0.075
respectively. The lower bound barely increases with $N$ and stays at around 0.55.
Many other quantities have the same kind of finite size scaling as $\alpha_c^>$, for instance
the fraction of used components $M_c$ at $\alpha_c$ (figure 8) and the fraction of faulty functions
$f$ in the optimal combination (figure 9). A notable exception is the fraction
of faulty functions in the constrained case: the integer nature of the problem causes
notable variations depending on $M$ and $N$. Despite relatively large finite size effects,
the numerical simulations confirm the validity of the theory.
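The enumeration procedure described in this section can be sketched as follows. This is a minimal illustration and not the simulation code used for the figures: the names `all_subsets`, `first_unsolvable_P` and `sample_patterns` are hypothetical, the defect probability is assumed to be $(1 + m/\sqrt{N})/2$, and a subset is called perfect when every normalized field exceeds $\kappa$. Memory grows as $2^N$, so the sketch is meant for small $N$.

```python
import numpy as np


def all_subsets(N):
    """Array whose rows are the 2^N - 1 non-empty subsets sigma in {0, 1}^N."""
    codes = np.arange(1, 2 ** N, dtype=np.int64)
    return ((codes[:, None] >> np.arange(N)) & 1).astype(np.int64)


def first_unsolvable_P(patterns, kappa=0.0):
    """Add patterns one at a time and return P*, the first P with no perfect subset.

    Returns None if a perfect subset survives all supplied patterns.
    """
    patterns = np.asarray(patterns)
    N = patterns.shape[1]
    subsets = all_subsets(N)
    alive = np.ones(len(subsets), dtype=bool)
    for P, xi in enumerate(patterns, start=1):
        alive &= (subsets @ xi) > kappa * np.sqrt(N)
        if not alive.any():
            return P
    return None


def sample_patterns(N, P_max, m, rng):
    """Patterns xi in {-1, +1}; -1 (defective) with probability (1 + m/sqrt(N))/2."""
    phi = 0.5 * (1.0 + m / np.sqrt(N))
    return np.where(rng.random((P_max, N)) < phi, -1, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 12
    P_star = first_unsolvable_P(sample_patterns(N, 4 * N, 0.0, rng))
    if P_star is not None:
        print("lower bound:", (P_star - 1) / N, "upper bound:", P_star / N)
```

Averaging the two bounds over many independent samples gives finite-size estimates of the kind discussed above.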

5. Flux recycling 

The DCP studied above is static in nature, and does not address the whole complexity
of component recycling, as in real life manufacturers produce a flux of faulty devices.
How to recycle a flux is therefore a relevant problem. Let us start with some simple
theoretical considerations. The central quantity of interest is the average quality of the
components in the optimal combination (whose sense will be defined below), defined as
the fraction of working functions of the components included in a combination $\sigma$, i.e.

$$q(\sigma) = \frac{1}{2} \left( 1 + \frac{\sum_{\mu=1}^{P} \sum_{i=1}^{N} \xi_i^\mu \sigma_i}{P N M(\sigma)} \right) , \qquad (28)$$

where $M(\sigma) = \sum_{i=1}^{N} \sigma_i / N$ is the relative size of the combination.
Figure 8. Fraction $M_c$ of used components in the optimal combination at $\alpha_c$. $N = 20$,
average over 10000 samples. Inset: finite-size analysis of $M_c$ for $m = 0$; average over
10000 samples.

Figure 9. Fraction $f$ of faulty functions as a function of $\alpha$. $N = 20$, average over
10000 samples. Inset: finite-size analysis of $f$ at $\alpha = 0.5$; average over 10000 samples.

Note that equation (2) implies that the quality of a working combination is always
greater than 1/2 if $\kappa$ is fixed to 0.
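The bound on the quality follows in one line from the majority-vote conditions. The worked step below is a sketch that assumes $\kappa = 0$ and a non-empty combination, so that $M(\sigma) > 0$:

```latex
\sum_{i=1}^{N} \xi_i^\mu \sigma_i > 0 \quad \forall \mu
\;\Longrightarrow\;
\sum_{\mu=1}^{P} \sum_{i=1}^{N} \xi_i^\mu \sigma_i > 0
\;\Longrightarrow\;
q(\sigma) = \frac{1}{2}\left( 1 + \frac{\sum_{\mu=1}^{P} \sum_{i=1}^{N} \xi_i^\mu \sigma_i}{P N M(\sigma)} \right) > \frac{1}{2} .
```

The converse does not hold: a combination with quality above 1/2 can still fail the vote for individual functions.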

Let us assume that we have $N_0$ components initially and that $N_0$ and $\kappa$ are fixed so
as to be in the fault-free region ($\alpha < \alpha_c$). In the following, we shall neglect fluctuations.
The typical fraction of working functions is $w_0 = q(\{1, \ldots, 1\}) = (1 - m/\sqrt{N_0})/2$. If we
now remove the optimal subset $\sigma_0$ with, on average, $N_0 M_0$ components of quality $q_0$,
we have a new ensemble of $N_1 = N_0 (1 - M_0)$ components with quality

$$w_1 = (w_0 - M_0 q_0)/(1 - M_0) . \qquad (29)$$
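This expression is a balance of working functions. As a short worked step, neglecting fluctuations as stated above, the working functions of the initial ensemble are shared between the removed subset and the remaining components:

```latex
N_0\, w_0 = N_0 M_0\, q_0 + N_1\, w_1 ,
\qquad N_1 = N_0 (1 - M_0)
\;\Longrightarrow\;
w_1 = \frac{w_0 - M_0 q_0}{1 - M_0} .
```

In particular $w_1 > w_0$ exactly when $q_0 < w_0$, that is, when the removed combination is of lower quality than the ensemble average.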
Figure 10. Fraction of working functions $w$ in the set of imperfect components (red
line) and quality $q$ of the optimal perfect combination (left graph: black line; right
graph: circles) in dynamical flux recycling for $P = 10$, $N = 18$ and $m = -1.5$ (left
graph) and a second value of $m$ (right graph).



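If $N_0 F_0$ fresh components taken from the flux of imperfect components are added,
the next iteration instead has $N_1 = N_0 (1 - M_0 + F_0)$ components with

$$w_1 = \frac{w_0 - M_0 q_0 + F_0 w_0}{1 - M_0 + F_0} \qquad (30)$$

and

$$\alpha_1 = \frac{\alpha_0}{1 - M_0 + F_0} . \qquad (31)$$

Generalizing these equations to the $n$-th step yields

$$w_{n+1} = \frac{w_n - M_n q_n + F_n w_0}{1 - M_n + F_n} \qquad (32)$$

and

$$\alpha_{n+1} = \frac{\alpha_n}{1 - M_n + F_n} . \qquad (33)$$

Flux recycling requires that the trajectory of $\alpha_n$ and $w_n$ stays in the fault-free region.
The flux problem can be solved at constant $N_n = N$, that is, $F_n = M_n$, in which case
Eq. (32) becomes

$$w_{n+1} = w_n + M_n (w_0 - q_n) , \qquad (34)$$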

whereas $\alpha_{n+1} = \alpha_n$. We propose two main ingredients. First of all, in the fault-free
region, there is an exponentially large number of perfect combinations; which one is it
best to select? In a static view, the one with the smallest number of components is the
most economical. However, as suggested by equation (34), $q_n$ should be minimized so
as to make it more probable that $w_{n+1}$ does not decrease as a function of time, which
would inevitably lead the system out of the fault-free region. Therefore, we define the
optimal perfect combination as the one with the smallest $q$. A remarkable consequence
of this choice is that this actually increases $w_n$ beyond $w_0$, as expected from the above
discussion, and hence ensures that a perfect combination is found at each time step if
$\alpha_0$ is sufficiently far from $\alpha_c$ (see figure 10). Note as well that no component is wasted,
hence the efficiency of this recycling scheme is 100%. If $\alpha$ is smaller than but close to $\alpha_c$, $\kappa$ can
be adjusted dynamically (i.e. lowered if needed) to compensate for adverse fluctuations
of $w_n$.

Figure 11. Phase diagram for $m = 0$ (continuous line), $-0.5$ (dashed line) and $-1$
(dot-dashed line).
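The recursion for the quality of the pool is easy to iterate numerically. The sketch below is illustrative only: the function names are hypothetical, and `recycle_constant_N` holds the removed fraction $M$ and the quality $q$ of the selected combination fixed, which is a simplifying assumption (in the actual problem both depend on the current pool).

```python
def next_quality(w_n, M_n, q_n, F_n, w0):
    """One step w_{n+1} = (w_n - M_n q_n + F_n w0) / (1 - M_n + F_n)."""
    denom = 1.0 - M_n + F_n
    if denom <= 0.0:
        raise ValueError("no components left after removal")
    return (w_n - M_n * q_n + F_n * w0) / denom


def next_alpha(alpha_n, M_n, F_n):
    """One step alpha_{n+1} = alpha_n / (1 - M_n + F_n)."""
    return alpha_n / (1.0 - M_n + F_n)


def recycle_constant_N(w0, M, q, steps):
    """Iterate at constant pool size (F_n = M_n) with fixed, hypothetical M and q."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = next_quality(w, M, q, M, w0)
        history.append(w)
    return history


if __name__ == "__main__":
    # Removing combinations of quality q below w0 raises the pool quality step by step.
    print(recycle_constant_N(w0=0.6, M=0.3, q=0.55, steps=3))
```

With $F_n = M_n$ each step adds $M_n (w_0 - q_n)$ to the pool quality, which makes explicit why selecting the perfect combination of smallest quality is favourable.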

When $\alpha$ is either close to or above $\alpha_c$ and $N_n$ and $\kappa$ are kept constant, a new ingredient
is needed. If no perfect combination is found, a simple but effective idea is to replace
the worst component by a fresh one, until a perfect combination can be found. This
keeps the recycling process going on forever, and makes it possible even for $\alpha > \alpha_c$.
The price to pay is that some of the worst components will be wasted. Interestingly,
the value of $w_n$ such that perfect combinations with average quality $q$ can be found is
entirely determined by $\alpha$, i.e. independent of $m$. If we start at $\alpha > \alpha_c$, eliminating
the worst components increases $w$ until $w \simeq q$ (see figure 10). Note however that $\alpha > \alpha_c$
can usually be avoided by lowering $\kappa$, unless the manufacturing process is really poor.
Figure 11 shows what $\kappa$ to choose for given $\alpha$ and $m$.

6. Summary and conclusions 

In this paper we have solved the binary DCP at the One-Step RSB level. The system is
characterized by a phase transition, similar to that of the Random Energy Model [?] or the
Gardner capacity problem with Ising couplings [?], from a fault-free regime with
an exponential number of perfect subsets to an imperfect regime where no perfect
combinations are available. We have compared our analytical findings with extensive
numerical simulations based on exact enumeration. Even though they present strong
finite size effects, as in other models [?], they confirm the validity of the theory
qualitatively, but we cannot rule out that in some regions further steps of RSB
are needed.

We have also addressed the dynamic problem of flux recycling and have proposed
efficient methods that lead to no wastage at all.